5 research outputs found

    ZASTOSOWANIE ALGORYTMU WYSZUKIWANIA WIELU WZORCÓW OPARTEGO O TECHNIKĘ Q-GRAMÓW DO WYSZUKIWANIA PRZYBLIŻONEGO

    We consider the application of a multiple pattern matching algorithm (Multi AOSO on q-Grams) to approximate pattern matching. We propose an on-line approach that translates the approximate pattern matching problem into a multiple pattern matching one (called partitioning into exact search). The presented solution allows relatively fast searching for multiple patterns in a text with at most k differences (or mismatches). The paper compares the solution based on the MAG algorithm with that of [4]. Experiments on DNA, English, Proteins and XML texts with up to k errors show that the proposed algorithm achieves relatively good results in practical use.
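    The partitioning idea named in this abstract can be illustrated with a minimal sketch, restricted here to k mismatches (Hamming distance). A pattern split into k + 1 pieces must have at least one piece occurring exactly in any occurrence with at most k mismatches, so exact hits of the pieces give candidate positions to verify. Python's `str.find` stands in for the multi-pattern matcher, and the function names are illustrative assumptions, not the authors' implementation.

```python
def hamming_at_most(a, b, k):
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def search_k_mismatches(pattern, text, k):
    """Return sorted start positions where pattern occurs with at most k mismatches."""
    m = len(pattern)
    pieces = k + 1
    if m < pieces:
        raise ValueError("pattern must have at least k + 1 characters")
    # Piece boundaries: piece p covers pattern[bounds[p]:bounds[p + 1]].
    bounds = [i * m // pieces for i in range(pieces + 1)]
    starts = set()
    for p in range(pieces):
        offset = bounds[p]
        piece = pattern[offset:bounds[p + 1]]
        # Exact search for the piece (a stand-in for a multi-pattern matcher).
        pos = text.find(piece)
        while pos != -1:
            start = pos - offset  # where the whole pattern would begin
            if 0 <= start <= len(text) - m and hamming_at_most(
                pattern, text[start:start + m], k
            ):
                starts.add(start)
            pos = text.find(piece, pos + 1)
    return sorted(starts)
```

    For example, `search_k_mismatches("quack", "the quick brown fox", 1)` reports the position of "quick". Handling k differences (edit distance) would need a wider verification window around each candidate.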

    A Bloom filter based semi-index on q-grams

    We present a simple q-gram based semi-index, which makes it possible to look for a pattern typically in only a small fraction of text blocks. Several space-time tradeoffs are presented. Experiments on Pizza & Chili datasets show that our solution is up to three orders of magnitude faster than the semi-index of Claude et al. [CNPSTjda10] at a comparable space usage.
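    A minimal sketch of how a block-wise q-gram filter of this kind might work: the text is cut into blocks, each block gets a Bloom filter over its q-grams, and a query scans only the blocks whose filter contains every q-gram of the pattern. The block layout (each block extended by `max_pattern_len - 1` characters), the hash choice and all names are assumptions made for illustration, not the design from the paper.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))


def qgrams(s, q):
    """Set of all substrings of length q (empty if s is shorter than q)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


class SemiIndex:
    def __init__(self, text, block_size, q, max_pattern_len, bits_per_block=1024, num_hashes=3):
        self.text, self.block_size, self.q = text, block_size, q
        self.max_pattern_len = max_pattern_len
        self.blocks = []
        for start in range(0, len(text), block_size):
            # Extend the block so an occurrence starting in it lies fully inside.
            extended = text[start:start + block_size + max_pattern_len - 1]
            bloom = BloomFilter(bits_per_block, num_hashes)
            for gram in qgrams(extended, q):
                bloom.add(gram)
            self.blocks.append((start, bloom))

    def search(self, pattern):
        """Return start positions of exact occurrences of pattern."""
        if len(pattern) > self.max_pattern_len:
            raise ValueError("pattern longer than max_pattern_len")
        grams = qgrams(pattern, self.q)
        hits = []
        for start, bloom in self.blocks:
            # A missing q-gram proves the block holds no occurrence; skip it.
            if all(gram in bloom for gram in grams):
                limit = start + self.block_size + len(pattern) - 1
                pos = self.text.find(pattern, start, limit)
                while pos != -1:
                    hits.append(pos)
                    pos = self.text.find(pattern, pos + 1, limit)
        return hits
```

    Bloom filters have no false negatives, so skipped blocks are safe to skip; false positives only cost an unnecessary scan of a block.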

    Revisiting Multiple Pattern Matching

    We consider the classical exact multiple string matching problem. The proposed solution is based on a combination of a few ideas: using q-grams instead of single characters, pattern superimposition, bit-parallelism and alphabet size reduction. We discuss the pros and cons of various alternatives to achieve the best possible combination of techniques. The main contributions of this paper are different alphabet mapping methods that reduce memory requirements and allow the use of larger q-grams. The experimental results show that the presented algorithm is competitive in most practical cases. One of the tests also shows that tailoring our scheme to search over a byte-encoded text results in speedups in comparison to searching over a plain text.

    Engineering the counting filter for string matching algorithms

    We consider a new approach to the popular counting filter technique for approximate pattern matching. Our solution is based on q-grams combined with alphabet reduction by bin packing and the use of Streaming SIMD Extensions (SSE). We present a few variants that use these techniques and discuss their pros and cons for two approximate pattern matching problems. The first is the well-known matching with k-differences and the second is a biological problem of DNA sequence mutation called matching with inversions and translocations. The experimental results show the effectiveness of our ideas, which speed up the counting filter and reduce the number of verifications by orders of magnitude.
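    For orientation, a minimal sketch of the classical character-based counting filter for matching with k-differences, which this abstract takes as its starting point. It does not include the q-gram, bin-packing or SSE variants described above, and all names are illustrative assumptions. A sliding window of the pattern's length m counts how many of its characters can be paired with pattern characters; only windows with a count of at least m - k are verified by dynamic programming.

```python
from collections import Counter


def sellers_last_row(pattern, text):
    """For each text position, the minimum edit distance between the pattern
    and any substring of text ending at that position."""
    m = len(pattern)
    col = list(range(m + 1))
    row = []
    for c in text:
        diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(diag + (pattern[i - 1] != c), old + 1, col[i - 1] + 1)
            diag = old
        row.append(col[m])
    return row


def counting_filter_search(pattern, text, k):
    """Return end positions (exclusive) of occurrences with at most k differences."""
    m = len(pattern)
    remaining = Counter(pattern)  # pattern count minus window count, per character
    count = 0                     # window characters paired with pattern characters
    ends = []
    for j, c in enumerate(text):
        if remaining[c] > 0:      # entering character can still be paired
            count += 1
        remaining[c] -= 1
        if j >= m:                # character leaving the window of length m
            gone = text[j - m]
            remaining[gone] += 1
            if remaining[gone] > 0:
                count -= 1
        if count >= m - k:
            # Candidate: verify on the last m + k characters only.
            segment = text[max(0, j + 1 - m - k):j + 1]
            if sellers_last_row(pattern, segment)[-1] <= k:
                ends.append(j + 1)
    return ends
```

    The filter never discards a true occurrence, so the result equals what running the dynamic programming over the whole text would report; the benefit is that most windows are rejected by the cheap counter update.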